La Crosse County
Bug Destiny Prediction in Large Open-Source Software Repositories through Sentiment Analysis and BERT Topic Modeling
Pope, Sophie C., Barovic, Andrew, Moin, Armin
This study explores a novel approach to predicting key bug-related outcomes, including the time to resolution, time to fix, and ultimate status of a bug, using data from the Bugzilla Eclipse Project. Specifically, we leverage features available before a bug is resolved to enhance predictive accuracy. Our methodology incorporates sentiment analysis to derive both an emotionality score and a sentiment classification (positive or negative). Additionally, we integrate the bug's priority level and its topic, extracted using a BERTopic model, as features for a Convolutional Neural Network (CNN) and a Multilayer Perceptron (MLP). Our findings indicate that the combination of BERTopic and sentiment analysis can improve certain model performance metrics. Furthermore, we observe that balancing model inputs enhances practical applicability, albeit at the cost of a significant reduction in accuracy in most cases. To address our primary objectives, predicting time-to-resolution, time-to-fix, and bug destiny, we employ both binary classification and exact time value predictions, allowing for a comparative evaluation of their predictive effectiveness. Results demonstrate that sentiment analysis serves as a valuable predictor of a bug's eventual outcome, particularly in determining whether it will be fixed. However, its utility is less pronounced when classifying bugs into more complex or unconventional outcome categories.
- North America > United States > Wisconsin > La Crosse County > La Crosse (0.14)
- North America > United States > Texas > Harris County > Spring (0.04)
- North America > United States > Colorado > El Paso County > Colorado Springs (0.04)
- (5 more...)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.68)
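The abstract above names four pre-resolution inputs (priority level, BERTopic topic, emotionality score, and sentiment class) but does not say how they are encoded. As a minimal sketch, assuming one-hot encodings and a hypothetical `BugRecord` container (the priority labels, field names, and score scale below are illustrative assumptions, not the authors' pipeline), the feature vector fed to an MLP might be assembled like this:

```python
from dataclasses import dataclass
from typing import List

# Assumed Bugzilla-style priority labels; the abstract does not list them.
PRIORITIES = ["P1", "P2", "P3", "P4", "P5"]


@dataclass
class BugRecord:
    """Hypothetical container for features known before a bug is resolved."""
    priority: str        # one of PRIORITIES
    topic_id: int        # topic index from a topic model, assumed precomputed
    emotionality: float  # emotionality score, assumed to lie in [0, 1]
    sentiment: str       # "positive" or "negative"


def to_features(bug: BugRecord, n_topics: int) -> List[float]:
    """Concatenate one-hot priority, one-hot topic, and the two sentiment features."""
    priority_vec = [1.0 if bug.priority == p else 0.0 for p in PRIORITIES]
    topic_vec = [1.0 if bug.topic_id == t else 0.0 for t in range(n_topics)]
    sentiment_flag = 1.0 if bug.sentiment == "positive" else 0.0
    return priority_vec + topic_vec + [bug.emotionality, sentiment_flag]


example = to_features(BugRecord("P2", 1, 0.4, "negative"), n_topics=3)
```

The resulting vector has `len(PRIORITIES) + n_topics + 2` entries; a CNN or MLP would consume it as a fixed-length input.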
LLM-Match: An Open-Sourced Patient Matching Model Based on Large Language Models and Retrieval-Augmented Generation
Li, Xiaodi, Chowdhury, Shaika, Wi, Chung Il, Vassilaki, Maria, Liu, Ken, Sio, Terence T, Garrick, Owen, Juhn, Young J, Cerhan, James R, Tao, Cui, Zong, Nansu
Patient matching is the process of linking patients to appropriate clinical trials by accurately identifying and matching their medical records with trial eligibility criteria. We propose LLM-Match, a novel framework for patient matching leveraging fine-tuned open-source large language models. Our approach consists of four key components. First, a retrieval-augmented generation (RAG) module extracts relevant patient context from a vast pool of electronic health records (EHRs). Second, a prompt generation module constructs input prompts by integrating trial eligibility criteria (both inclusion and exclusion criteria), patient context, and system instructions. Third, a fine-tuning module with a classification head optimizes the model parameters using structured prompts and ground-truth labels. Fourth, an evaluation module assesses the fine-tuned model's performance on the testing datasets. We evaluated LLM-Match on four open datasets - n2c2, SIGIR, TREC 2021, and TREC 2022 - using open-source models, comparing it against TrialGPT, Zero-Shot, and GPT-4-based closed models. LLM-Match outperformed all baselines.
- North America > United States > Minnesota > Olmsted County > Rochester (0.05)
- North America > United States > Wisconsin > La Crosse County > La Crosse (0.04)
- Asia > Taiwan > Taiwan > Taipei (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
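The prompt generation module described above combines system instructions, inclusion and exclusion criteria, and retrieved patient context. A minimal sketch of that assembly step follows; the function name `build_prompt`, the section headings, and the sample strings are assumptions for illustration and are not taken from LLM-Match itself.

```python
from typing import List


def build_prompt(
    system_instructions: str,
    inclusion: List[str],
    exclusion: List[str],
    patient_context: List[str],
) -> str:
    """Join instructions, eligibility criteria, and patient context into one prompt."""
    lines = [system_instructions.strip(), "", "Inclusion criteria:"]
    lines += [f"- {c}" for c in inclusion]
    lines += ["", "Exclusion criteria:"]
    lines += [f"- {c}" for c in exclusion]
    lines += ["", "Patient context:"]
    lines += [f"- {s}" for s in patient_context]
    return "\n".join(lines)


# Illustrative inputs only; not drawn from any of the evaluated datasets.
prompt = build_prompt(
    "Decide whether the patient is eligible for the trial.",
    inclusion=["Age 18 or older"],
    exclusion=["Prior chemotherapy"],
    patient_context=["62-year-old patient", "No prior cancer treatment recorded"],
)
```

In a fine-tuning setup with a classification head, a prompt like this would be paired with a ground-truth eligibility label.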
WavePulse: Real-time Content Analytics of Radio Livestreams
Mittal, Govind, Gupta, Sarthak, Wagle, Shruti, Chopra, Chirag, DeMattee, Anthony J, Memon, Nasir, Ahamad, Mustaque, Hegde, Chinmay
Radio remains a pervasive medium for mass information dissemination, with AM/FM stations reaching more Americans than either smartphone-based social networking or live television. Increasingly, radio broadcasts are also streamed online and accessed over the Internet. We present WavePulse, a framework that records, documents, and analyzes radio content in real time. While our framework is generally applicable, we showcase the efficacy of WavePulse in a collaborative project with a team of political scientists focusing on the 2024 Presidential Elections. We use WavePulse to monitor livestreams of 396 news radio stations over a period of three months, processing close to 500,000 hours of audio streams. These streams were converted into time-stamped, diarized transcripts and analyzed to answer key political science questions at both the national and state levels. Our analysis revealed how local issues interacted with national trends, providing insights into information flow. Our results demonstrate WavePulse's efficacy in capturing and analyzing content from radio livestreams sourced from the Web. Code and dataset can be accessed at https://wave-pulse.io.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > United States > New York > Kings County > New York City (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- (215 more...)
- Media > Radio (1.00)
- Leisure & Entertainment (1.00)
- Government > Voting & Elections (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
Bi-Filtration and Stability of TDA Mapper for Point Cloud Data
Carlsson, Singh and Memoli's TDA mapper takes a point cloud dataset and outputs a graph that depends on several parameter choices. Dey, Memoli, and Wang developed Multiscale Mapper for abstract topological spaces so that parameter choices can be analyzed via persistent homology. However, when applied to actual data, one does not always obtain filtrations of mapper graphs. DBSCAN, one of the most common clustering algorithms used in the TDA mapper software, has two parameters, $\epsilon$ and MinPts. If MinPts = 1 then DBSCAN is equivalent to single linkage clustering with cutting height $\epsilon$. We show that if DBSCAN clustering is used with MinPts > 2, a filtration of mapper graphs may not exist except in the absence of free-border points; but such filtrations exist if DBSCAN clustering is used with MinPts = 1 or 2 as the cover size increases, $\epsilon$ increases, and/or MinPts decreases. However, the 1-dimensional filtration is unstable. If one adds noise to a data set so that each data point has been perturbed by a distance at most $\delta$, the persistent homology of the mapper graph of the perturbed data set can be significantly different from that of the original data set. We show that we can obtain stability by increasing both the cover size and $\epsilon$ at the same time. In particular, we show that the bi-filtrations of the homology groups with respect to cover size and $\epsilon$ between these two datasets are $2\delta$-interleaved.
- North America > United States > Wisconsin > La Crosse County > La Crosse (0.04)
- North America > United States > New Jersey (0.04)
- North America > United States > Iowa (0.04)
- (2 more...)
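The abstract above states that DBSCAN with MinPts = 1 coincides with single linkage clustering cut at height $\epsilon$. The following sketch checks that claim on a toy point cloud. It is a textbook-style illustration, not the paper's code, and it assumes two conventions: a point's $\epsilon$-neighborhood includes the point itself, and points at distance exactly $\epsilon$ count as neighbors. The sample points are made up.

```python
from itertools import combinations
from math import dist


def single_linkage(points, eps):
    """Clusters from cutting single linkage at height eps (components of the eps-graph)."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= eps:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), set()).add(i)
    return {frozenset(g) for g in groups.values()}


def dbscan(points, eps, min_pts):
    """Plain DBSCAN returning clusters as frozensets of indices; noise points are omitted."""
    n = len(points)
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps] for i in range(n)]
    core = [len(nb) >= min_pts for nb in neighbors]
    label = [None] * n
    clusters = []
    for i in range(n):
        if label[i] is not None or not core[i]:
            continue
        members, stack = set(), [i]
        while stack:
            p = stack.pop()
            if label[p] is not None:
                continue
            label[p] = len(clusters)
            members.add(p)
            if core[p]:  # only core points expand the cluster
                stack.extend(q for q in neighbors[p] if label[q] is None)
        clusters.append(frozenset(members))
    return set(clusters)


points = [(0, 0), (0.5, 0), (1.0, 0), (5, 5), (5.4, 5), (10, 10)]
same = dbscan(points, 0.6, 1) == single_linkage(points, 0.6)
```

With MinPts = 1 every point is a core point, so both functions return the connected components of the graph that joins points within $\epsilon$ of each other. Raising `min_pts` to 3 on the same data leaves only one cluster and turns the remaining points into noise, which is where the two methods diverge.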
Hierarchical Classification System for Breast Cancer Specimen Report (HCSBC) -- an end-to-end model for characterizing severity and diagnosis
Santos, Thiago, Kamath, Harish, McAdams, Christopher R., Newell, Mary S., Mosunjac, Marina, Oprea-Ilies, Gabriela, Smith, Geoffrey, Lehman, Constance, Gichoya, Judy, Banerjee, Imon, Trivedi, Hari
Automated classification of cancer pathology reports can extract information from unstructured reports and categorize each report into structured diagnosis and severity categories. Thus, such a system can reduce the burden of populating tumor registries, help with registration for clinical trials, and support the development of large datasets for deep learning model development using true pathologic ground truth. However, the content of breast pathology reports can be difficult to categorize due to the high linguistic variability in content and the wide variety of potential diagnoses (>50). Existing NLP models are primarily focused on developing classifiers for primary breast cancer types (e.g. IDC, DCIS, ILC) and tumor characteristics, and ignore the rare diagnoses of cancer subtypes. We therefore developed a hierarchical hybrid transformer-based pipeline (59 labels) - Hierarchical Classification System for Breast Cancer Specimen Report (HCSBC), which utilizes the potential of the transformer context-preserving NLP technique, and compared our model to several state-of-the-art ML and DL models. We trained the model on the EUH data and evaluated our model's performance on two external datasets - MGH and Mayo Clinic. We publicly release the code and a live application under a Huggingface Spaces repository.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- Oceania > Australia (0.04)
- (10 more...)
Sketching Robot Programs On the Fly
Porfirio, David, Stegner, Laura, Cakmak, Maya, Sauppé, Allison, Albarghouthi, Aws, Mutlu, Bilge
Service robots for personal use in the home and the workplace require end-user development solutions for swiftly scripting robot tasks as the need arises. Many existing solutions preserve ease, efficiency, and convenience through simple programming interfaces or by restricting task complexity. Others facilitate meticulous task design but often do so at the expense of simplicity and efficiency. There is a need for robot programming solutions that reconcile the complexity of robotics with the on-the-fly goals of end-user development. In response to this need, we present a novel, multimodal, and on-the-fly development system, Tabula. Inspired by a formative design study with a prototype, Tabula leverages a combination of spoken language for specifying the core of a robot task and sketching for contextualizing the core. The result is that developers can script partial, sloppy versions of robot programs to be completed and refined by a program synthesizer. Lastly, we demonstrate our anticipated use cases of Tabula via a set of application scenarios.
- North America > United States > New York > New York County > New York City (0.15)
- North America > United States > Wisconsin > Dane County > Madison (0.14)
- North America > United States > Wisconsin > La Crosse County > La Crosse (0.14)
- (17 more...)
- Government (0.46)
- Education (0.46)
AI reduces miss rate of precancerous polyps in colorectal cancer screening
Most colon polyps are harmless, but some develop over time into colon or rectal cancer, which can be fatal if found in its later stages. Colorectal cancer is the second most deadly cancer in the world, with an estimated 1.9 million cases and 916,000 deaths worldwide in 2020, according to the World Health Organization. A colonoscopy is an exam used to detect changes or abnormalities in the large intestine (colon) and rectum. Between February 2020 and May 2021, 230 study participants each underwent two back-to-back colonoscopies on the same day at eight hospitals and community clinics in the U.S., U.K. and Italy. One colonoscopy used AI; the other, a standard colonoscopy, did not. The rate at which precancerous colorectal polyps are missed has been estimated to be 25%.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.19)
- North America > United States > Florida > Duval County > Jacksonville (0.14)
- North America > United States > Wisconsin > La Crosse County > La Crosse (0.10)
- (4 more...)
- Health & Medicine > Therapeutic Area > Oncology > Colorectal Cancer (1.00)
- Health & Medicine > Therapeutic Area > Gastroenterology (1.00)
Machine Bias: There's Software Used Across the Country to Predict Future Criminals. And it's Biased Against Blacks.
On a spring afternoon in 2014, Brisha Borden was running late to pick up her god-sister from school when she spotted an unlocked kid's blue Huffy bicycle and a silver Razor scooter. Borden and a friend grabbed the bike and scooter and tried to ride them down the street in the Fort Lauderdale suburb of Coral Springs. Just as the 18-year-old girls were realizing they were too big for the tiny conveyances -- which belonged to a 6-year-old boy -- a woman came running after them saying, "That's my kid's stuff." Borden and her friend immediately dropped the bike and scooter and walked away. But it was too late -- a neighbor who witnessed the heist had already called the police. Borden and her friend were arrested and charged with burglary and petty theft for the items, which were valued at a total of $80. Compare their crime with a similar one: The previous summer, 41-year-old Vernon Prater was picked up for shoplifting $86.35 worth of tools from a nearby Home Depot store. Prater was the more seasoned criminal. He had already been convicted of armed robbery and attempted armed robbery, for which he served five years in prison, in addition to another armed robbery charge. Borden had a record, too, but it was for misdemeanors committed when she was a juvenile.
- North America > United States > Florida > Broward County > Fort Lauderdale (0.25)
- North America > United States > Virginia (0.05)
- North America > United States > Kentucky (0.04)
- (14 more...)
The Legal System Uses an Algorithm to Predict If People Might Be Future Criminals. It's Biased Against Blacks.
On a spring afternoon in 2014, Brisha Borden was running late to pick up her god-sister from school when she spotted an unlocked kid's blue Huffy bicycle and a silver Razor scooter. Borden and a friend grabbed the bike and scooter and tried to ride them down the street in the Fort Lauderdale suburb of Coral Springs. Just as the 18-year-old girls were realizing they were too big for the tiny conveyances -- which belonged to a 6-year-old boy -- a woman came running after them saying, "That's my kid's stuff." Borden and her friend immediately dropped the bike and scooter and walked away. But it was too late -- a neighbor who witnessed the heist had already called the police. Borden and her friend were arrested and charged with burglary and petty theft for the items, which were valued at a total of $80. Compare their crime with a similar one: The previous summer, 41-year-old Vernon Prater was picked up for shoplifting $86.35 worth of tools from a nearby Home Depot store. Prater was the more seasoned criminal. He had already been convicted of armed robbery and attempted armed robbery, for which he served five years in prison, in addition to another armed robbery charge. Borden had a record, too, but it was for misdemeanors committed when she was a juvenile.
- North America > United States > Florida > Broward County > Fort Lauderdale (0.25)
- North America > United States > Virginia (0.05)
- North America > United States > Kentucky (0.04)
- (14 more...)
Complexity of Self-Preserving, Team-Based Competition in Partially Observable Stochastic Games
Allen, Marty (University of Wisconsin-La Crosse)
Partially observable stochastic games (POSGs) are a robust and precise model for decentralized decision making under conditions of imperfect information, and extend popular Markov decision problem models. Complexity results for a wide range of such problems are known when agents work cooperatively to pursue common interests. When agents compete, things are less well understood. We show that under one understanding of rational competition, such problems are complete for the class NEXP^NP. This result holds for any such problem comprised of two competing teams of agents, where teams may be of any size whatsoever.
- North America > United States > Wisconsin > La Crosse County > La Crosse (0.14)
- North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
- North America > United States > Massachusetts > Middlesex County > Reading (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)